From YouTube: CGD Seminar Series - Claire Monteleoni
A
But as Katie mentioned, I've been working on climate informatics due to the threat of climate change: the extreme storms and natural disasters we've been seeing, and their threats to communities and ecosystems. This line of research is based on a vision that machine learning can help shed light on climate change.
A
I started trying to argue this to people in the AI and machine learning field about six years ago. The way we broke down where we thought machine learning might help, and was already helping, included questions of paleoclimate reconstruction. For us, this is a very large, sparse matrix in space and time: you have paleo proxy data that goes back very far in time, but only at a few sparse spatial locations.
A
More generally speaking, as I'm sure all of you know, because you're intimate with a lot of the data and the modeling, this is a huge playground of spatiotemporal data and the nonlinear dependencies we might have over both time and space.
A
But again, this was six years ago, when we were trying to carve out the space for machine learning. We've also seen a bunch of other applications, and as we move more broadly toward climate change mitigation, there are things like collaborating with NREL to try to robustify forecasts of solar output on the order of minutes, which is something we started to do as well.
A
But,
generally
speaking,
you
know,
I'm
sure
you
guys
have
problems
that
aren't
on
this
slide,
where
I
believe
machine
learning
could
probably
have
an
impact,
and
so
I
look
forward
to
the
discussion.
A
I think I only reliably have time for one case study, so I'm going to do it on downscaling, because that's relatively general; we've shown the downscaling example on both temperature and precipitation. If there's time, I'll talk about an avalanche detection task, which is an instance of anomaly detection, meaning there's severe class imbalance. It's a rare event, and in our setting we had a ground survey, but only a very limited amount of survey data, along with, of course, plenty of satellite data.
A
I generally like to start with an idea to hold onto, sort of a punch line, but this is just jumping into the details. For people for whom this doesn't make sense, don't worry, we'll go through the case studies; but if you're already familiar with what deep learning is basically doing, I want to make the jump from supervised to unsupervised.
A
So
I'd
like
to
just
boil
down
my
entire
network
of
weights
and
activations
to
this
w,
representing
all
the
parameters
that
are
going
to
be
learned
from
data
and
the
way
a
neural
network
is
trained.
Is
that
you
apply
your
whole
network
f
w
to
an
input
example
say
an
image
or
like
a
vector
or
tensor
of
data
and
make
some
output.
It
could
be
a
scalar
or
a
vector
we'll
call
that
y-hat,
but
training
in
the
machine.
Learning
context
is
trying
to
do
some
sort
of
gradient
descent
on
a
loss
function.
A
The loss penalizes how well your network output y-hat approximates the correct label for your input x. If you know what the ground-truth target should be for input x, we call that y, and then, given many input-output pairs where y is the desired output, we can train.
A
We can fit the parameters w. How do we do that? We do stochastic gradient descent on the loss function. Remember, the loss function compares the network's output to the target label, the ground-truth desired output on input x, and then, via the chain rule for taking derivatives, we get the incremental updates for every single network weight, no matter the architecture.
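In code, that whole recipe (apply f_w, compare y-hat to y in a loss, backpropagate via the chain rule, nudge every weight) is only a few lines. A minimal sketch, with a single linear "network" and squared loss; the toy data and learning rate here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised data: y is roughly 3 * x plus a little noise.
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

w = np.zeros(1)  # all learnable parameters, the "w" on the slide

for step in range(300):
    i = rng.integers(len(X))             # stochastic: one example at a time
    y_hat = X[i] @ w                     # forward pass: f_w(x)
    grad = 2.0 * (y_hat - y[i]) * X[i]   # chain rule on the squared loss
    w -= 0.05 * grad                     # incremental update to every weight
```

After training, w has been pulled toward the true slope of 3 purely by gradient steps on the loss.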
A
So that's when you have a label, and I'm trying to argue that this is not too dissimilar from the case where you don't have a label. If you don't even know what "label" means, because it's jargon: you might simply be in the case where you don't really have labels, meaning you don't have enough ground-truth examples of input-output pairs. But that's okay.
A
So
you
have
data
and
we'll
still
have
our
network
parametrized
by
all
the
parameters
and
kind
of
architecture,
choices
that
I'm
indicating
by
f
of
w
and
so
now
I'll
just
call
the
network
output
x
hat
because
there's
no
independent
or
there's
no
label
y.
So
that
means
we
can
write
down
a
loss
function
or
some
objective
function
that
you're
trying
to
minimize
via
training
but-
and
that's
perfectly
fine
just
that
now
that
loss
function
can
only
depend
on
the
network's
input
and
the
network's
output.
There's
no
external
information.
A
But assuming you write down such a loss function, then everything else follows similarly: you're doing gradient descent to push the output of the network to be more similar to the input, in whatever way you've defined in the loss.
A
Here
you
do
gradient
descent
on
that
and
that'll
tell
you
how
to
update
all
the
network
parameters,
so
this
should
hopefully
be
extraordinarily
freeing,
because,
usually
the
bottleneck
is
how
much
data
you
have
where
you
know
what
the
ground
truth
label,
like
classification,
for
example,
should
be,
and
so
this
should
hopefully
allow
you
to
think
more
creatively
about
how
machine
learning
can
be
applied
in
your
in
your
science.
A
One
form
that
I'm
not
going
to
talk
about
today
of
of
loss
function
relates
to
clustering,
so
objectives
for
doing
exploratory
data
analysis,
hierarchical
clustering,
k,
center
clustering,
a
lot
of
spectral
clustering.
All
of
these
have
mathematical
objectives
that
you
can
write
down.
That
only
depend
on
the
input
data
so
as
we
consider
either
an
entire
training
data
set
or
many
batches.
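To make "an objective that depends only on the input data" concrete, here is the familiar k-means objective, a close cousin of the clustering objectives just mentioned, evaluated and greedily improved on invented toy data. This is only an illustrative sketch, not the speaker's method:

```python
import numpy as np

rng = np.random.default_rng(0)
# Unlabeled data: two well-separated 2-D blobs.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])

def kmeans_objective(X, centers):
    """Sum of squared distances to the nearest center.
    Note it touches only the input data: no labels anywhere."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

centers = X[rng.choice(len(X), 2, replace=False)]
for _ in range(10):  # Lloyd's iterations drive the objective down
    assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.array([X[assign == k].mean(0) if (assign == k).any()
                        else centers[k] for k in range(2)])
```

The objective is minimized using nothing but the unlabeled inputs, which is exactly the point being made about unsupervised losses.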
A
Usually
we
would
think
of
it
as
a
compact
representation,
so
we
can
think
of
it
as
automatic
feature
extraction,
and
so
we
can
think
of
this
machine
which
is
sort
of
a
useless
stupid
machine
that
we
don't
think
we
would
really
need,
because
it's
almost
doing
this
noisy
version
of
an
identity
function,
we're
going
to
give
it
input
here.
It's
gray,
scale
images,
so
each
pixel
lives
in
grayscale
and
the
images
were
generated
by
humans,
like
writing,
addresses
at
the
post
office
and
so
we're
trying
to
do
digit
recognition.
A
But
the
training
is
that
loss
that
compares
the
input
of
the
network
to
the
output
of
the
network,
as
we
showed
on
the
previous
slide
and
tries
to
push
them
close
together,
and
we
do
gradient
descent
on
that
loss
to
learn
the
weights
in
these
blocks.
You
don't
really
have
to
know.
What's
going
on
other
than
that,
there
are
parameters
that
we
fit
here
and
so
at
the
end,
we
we
get
this
stupid
machine
that
tries
to
make
an
input
like
an
output
and
even
has
some
loss
or
error.
A
But
this
is
what's
called
a
pretext
task
so
that
we've
learned
this
kind
of
noisy
identity
function
for
as
a
pretext
to
instead
look
at
what
we
got
in
the
bottleneck
is
that
if
we've
sent
the
data
in
in
training
through
a
lower
dimension,
then
we
might
get
some
sort
of
compact
representation
now
of
the
whole
data
distribution
that
we
trained
on.
A
That's
that's
the
high
level
idea,
and
this
is
the
idea
behind
an
auto
encoder,
and
so
typically,
if
your
input
is
at
some
dimension,
which
is
also
the
dimension
of
your
output,
when
you're
trying
to
generate
images-
or
you
know,
data
similar
to
the
input-
then
we
might
think
of
this
latent
representation
as
a
compact
representation
of
the
data.
A
Again, these are all the network parameters that we are going to estimate while trying to minimize the reconstruction error between output and input. I would say, for the most part in climate and the other applications you care about, you would be looking for a compact representation, because that's some way to summarize your data, or to extract features automatically. But I will just mention, and this is sort of curious: cognitive scientists, or vision scientists, believe that in human cognition we're doing something similar, but sending the data through a wide layer that is actually higher-dimensional than the input and output dimension. And from neuroscience there's evidence of this in other systems, like the fruit fly's olfactory system: they map the smells they receive as input into a higher-dimensional space than the sensory input space. The point of doing that: you're certainly less compact in dimension than the input, but within that overcomplete representation,
A
The
goal
is
that
the
representations
will
then
be
quite
sparse,
which
helps
you
to
distinguish
different.
You
know,
sense
tokens,
so
that's
in
cognitive,
science.
So
vision
people
might
use
wide
or
over
complete,
auto
encoders.
A
So
the
first
idea
of
a
variational
autoencoder
was
to
fit
a
gaussian
distribution,
so
you'll
start
with
some
simple
prior,
but
then
you
could
actually
learn
a
full
covariance
structure.
A
Well
here
this
is
actually
spherical,
but
you
could
learn
a
full
gaussian
distribution
over
latent
representations
and
and
then
shortly
I'll
talk
about
the
next
step
away
from
this,
where
it
doesn't
even
have
to
be
gaussian
so
anyway,
that's
sort
of
basic
things
that
I'm
gonna
touch
on
in
unsupervised.
Learning,
as
I
mentioned,
my
group
is
getting
really
excited
around
unsupervised
learning
and
has
done
some
work
on
avalanche
detection.
I
probably
won't
have
time
to
get
to
that.
A
Maybe
at
the
end,
I'm
mostly
going
to
talk
about
a
downscaling
task,
and
here
it's
an
unsupervised
method,
whereas
the
previous
one
was
semi-supervised.
It
was
sort
of
an
unsupervised
plus
supervised
pipeline
for
this
one,
it's
unsupervised,
but
I'm
going
to
talk
about
self-supervision
briefly,
as
my
interpretation
of
why
this
actually
works
so
well
in
practice.
A
So
brian,
who
I
actually
think
was
a
a
summer
intern
at
some
lab
in
encar
a
couple
of
years
back
did
a
master's
here
in
computer
science
and
then
graduated
from
his
master's
thesis
and
went
on
to
a
climate
science
lab
in
potsdam
like
albert
wagner
or
something
but
anyway
this
this.
This
first
project,
I'm
going
to
talk
about,
was
his
master's
thesis.
A
So
you
know
for
this
audience.
I
don't
need
to
distinguish
too
much
what
I
mean
about
downskilling.
I
will
warn
you,
though,
as
when
you,
when
you
go
out
to
get
software
packages
from
machine
learning
and
computer
vision
about
the
terminology,
so
sometimes
we'll
talk
about
up
sampling
or
super
resolution,
meaning
actually
downscaling
so
up
and
down
get
reversed.
So,
let's
be
very
clear:
we
want
to
use
coarse
scale,
space,
geotemporal
data
fields
to
infer
values
at
finer
scales.
A
Actually,
ultimately,
the
method
that
we
provide
is
symmetric,
so
you
can
also
go
from
fine
to
core
scale
if
that
would
be
needed
and,
of
course,
there's
a
whole
field
on
this.
From
our
perspective,
thinking
phrasing
this
using
you
know,
machine
learning
jargon,
is
that
the
methods
that
we
saw
were
supervised
learning
methods.
A
You
would
need
to
see
the
field
at
a
course
scale,
and
the
corresponding
field
at
fine
scale
for
and
you'd
have
need
to
have
many
of
these
paired
instances
where
you
get
both
coarse-scaled
and
fine-scale
data
paired
in
order
to
train
your
method,
so
we'd
like
to
move
away
from
that
first
off.
Secondly,.
A
They would tend to provide point predictions, meaning that once trained, you would give such a method a map of your variable at the coarse scale and it would output one instantiation at fine scale. We're calling that a point prediction even though it's a whole map at the finer scale. We would actually prefer a distribution over such maps at fine scale, and so we're going to call that generative downscaling. Brian's approach to generative downscaling used a problem definition in machine learning called domain alignment, as well as some very recent approaches addressing domain alignment in deep unsupervised learning. Self-supervision is when there is some kind of underlying structure, usually temporal or spatial, that connects your data, so that you don't need external labels. My interpretation of why this technique actually worked so well for generative downscaling is that we didn't need to train on any paired images.
A
Okay, so not only can you map between data at different spatial scales, in this case one-degree lat/lon box resolution versus one-eighth degree, but also these data sets could be quite different. We have reanalysis data from ERA, right.
A
So
that's
generally
thought
of
as
coming
from
observations,
even
though
it's
smoothed
a
bit
through
models
and
then
nwp
data,
which
is
of
course
from
wharf,
which
is
you
know,
simulated
and
based
on
physics,
and
so
we
did
two
separate
experiments
or
two
separate
studies,
one
for
temperature
and
one
for
precip.
So,
on
the
left
hand
side,
you
have
the
coarse
scale
resolution
for
temperature
and
per
sip
and
then
on
the
right
hand,
side.
A
You
have
the
finer
scale,
but
again
one
is
from
era
reanalysis
and
the
other
is
the
output
of
warf
okay.
So
this
domain
alignment
task
is
saying
that
I
have
two
random
variables
and
what
I'd?
Like
to
learn
is
a
bijection
so
that
if
I
have
samples
from
the
marginal
of
x-
and
I
apply,
this
function
f
I'll
approximate,
the
marginal
of
y
for
two
random
variables.
A
And
similarly,
if
I
have
iid
samples
from
the
marginal
of
y-
and
I
apply
f
inverse-
I
can
approximate
the
marginal
of
x
so
to
clarify
these
x's
and
y's
are
not
I'm
supposed
to
be
example
and
label.
These
are
just
two
different
random
variables
and
for
our
purposes
you
know
one
is
going
to
be
the
course
scaled
distribution
and
one
is
going
to
be
the
fine
grained
scaled
distribution
and
we
don't
need
any
pairings
between
x
and
y.
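In one dimension there's a classical instance of exactly this idea: quantile mapping, long used in statistical downscaling for bias correction. The sketch below is only to make "match the marginals with an invertible map, no pairing needed" concrete, on invented Gaussian toy marginals; the method in the talk learns a deep invertible map instead:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)  # samples from the marginal of X (coarse)
y = rng.normal(5.0, 2.0, 5000)  # samples from the marginal of Y (fine)
# No pairing between x[i] and y[i] is ever used below.

xs, ys = np.sort(x), np.sort(y)

def f(v):
    """Monotone (hence invertible) map sending the X marginal to the Y
    marginal: empirical CDF of X, then empirical quantile function of Y."""
    u = np.searchsorted(xs, v) / len(xs)   # approximate F_X(v) in [0, 1]
    return np.quantile(ys, np.clip(u, 0.0, 1.0))

mapped = f(x)  # now distributed approximately like Y
```

Pushing unpaired X samples through f reproduces Y's marginal, which is the one-dimensional shadow of what the flow-based bijection does for whole fields.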
A
This latent space, which before I was showing as a distribution over those bottleneck representations, is just some space that I'm going to learn, starting with some prior over it, which could be very simple, like an isotropic Gaussian. But since ultimately I'd like to have a joint distribution, a joint probability density function over the data at the two different resolutions, I'm going to add an assumption: that the two domains X and Y, coarse and fine resolution, are conditionally independent given this latent space Z.
A
So
when
I've
added
that
assumption,
then
of
course
that
allows
me
to
factor
the
full
joint
between
high
resolution.
You
know
coarse
resolution,
fine
resolution
and
the
latent
space,
which
then,
of
course,
would
allow
me
to
represent
my
joint
and
so
then.
Ultimately,
I
could
sample
conditionally
on
a
coarse
grained
resolution.
I
could
sample
fine
grained
resolutions
or
vice
versa,
or
I
could
take
unconditional
samples
of
the
informative
posterior.
I've
learned
over
the
latent
space
and
then
sample
at
either
lower
high
resolution.
A
So there is that assumption of conditional independence over a shared latent space, and everything works subject to that assumption. Then, for learning the conditional distribution of one space given the latent space, and similarly for the other, we're going to use a technique that came out last year at AAAI. In terms of getting a more informative distribution over the latent space, I'm going to talk on the next slide about normalizing flows. But the punch line here, again, is that you might want to do downscaling in your application but not have access to paired images that really correspond between the coarse resolution and the fine resolution, whereas you may be able to get samples at either resolution as much as you want; and here the pairing between those maps, or fields, is not required.
A
That is fit by training on your data distribution, and you can do so by composing invertible transformations where you learn the parameters of each; that composition is called a flow. The punch line here: we're actually not going to use any of the flows on this slide; we're going to use an algorithm called Glow. But we're going to learn mappings that can be much more informative than just a Gaussian over our latent space Z.
A
So
one
invertible
mapping
from
one
domain
say
the
course
resolution
domain
and
then
another
one
mapping
between
z
and
the
the
fine
resolution
domain,
and
so
this
architecture
largely
follows
this
paper
a
line
flow
but
we're
using
a
normalizing
flow
called
glow,
which
is
one
by
one
invertible
convolutions.
A
Where
is
the
learning
happening?
The
learning
is
only
happening,
so
parameters
are
only
getting
fit
in
this
glow
step,
which
are
the
parameters
that
will
instantiate
that
composed
and
invertible
mapping
between
one
data
space
and
this
latent
space
and
between
the
other
data
space
and
the
shared
latent
space.
So
we
can
have
a
different
parameter
set
for
each
of
those
invertible
mappings
and
then
that
of
course
yields
a
mapping
between
the
two
different
data
spaces.
A
So
this
latent
dimension
is
neither
wider
nor
com
compact.
It
actually
matches
the
dimension
of
of
our
fine
resolution
data
and
so
we'll
store
our
course
resolution
data
at
the
same
dimension
and
so
to
get
it
up
sampled.
We
simply
just
look
at
neighboring
cells
to
up
sample
it
and
provide
the
minimum
additional
information,
okay,
so
more
information
on
the
machine
learning
architecture
and
then
the
normalizing
flow
or
in
these
two
papers.
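That "minimum additional information" upsampling, just copying each coarse cell to fill the fine grid, is essentially one line of numpy. A sketch, with a factor of 2 purely to keep the example small:

```python
import numpy as np

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])  # a 2x2 coarse-resolution field

# Nearest-neighbour upsampling: each coarse cell is copied into a 2x2
# block, so the result lives at the fine-grid dimension but carries no
# information beyond the coarse field itself.
fine_shaped = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
```

`fine_shaped` is 4x4, matching the fine-resolution dimension the flow expects.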
A
So
this
was
pretty
new
in
statistical
downscaling,
and
so
we
didn't
have
a
way
to
compare
numerically
to
another
generative
downscaler.
This
is
the
only
one
that
we
know
of
so.
Instead
we
compared
to
point
predictions,
which
means
the
comparisons
had
to
be
run
using
paired
images.
The
tests
were
run
on
images
paired
at
the
high
res
and
lo-res,
and
so
we
could
compare
to
bcsd,
which
hopefully
you
guys
know
what
it
is.
A
It
was
sort
of
the
state
of
the
art
at
the
time
in
downscaling,
along
with
a
climate
informatics
2019
paper
by
banyo
medina,
that
used
a
convolutional
neural
network,
so
it
was
still
supervised
in
that
you
needed
paired
images,
so
we
can
do
kind
of
tests
on
a
held
out
data
set
that
you
know.
None
of
the
methods
were
trained
on
of
paired
images,
and
you
know
our
technique
didn't
do
much
worse
than
either
of
them.
A
We
wouldn't
expect
it
to
do
better,
because
the
unsupervised
task
is
generally
harder,
but
it's
pretty
comparable
and
then
we
don't
really
have
a
quantitative
way
of
showing
it,
but
where
this
shines
right
is
that
it
can
be
oh
and
we
did
a
whole
bunch
of
climb
decks
indices.
This
is
this
was
just
one
of
the
tables
in
brian's
thesis,
so
I
would
refer
you
to
that
or
to
the
archive
or
the
climate
informatics
proceedings.
A
But
how
do
these
predictions
work?
The
the
the
distributional
aspect
of
the
model,
so
we
could
input
a
course
resolution
image
to
the
model
and
for
reference,
we're
testing
this
on
a
reference
set
where
we
have
a
paired
image,
and
so
this
happened
to
be
the
paired
image
from
wharf
at
the
fine
grained
resolution.
This
is
not
given
to
the
model
and
then
on
the
right
in
in
shaded,
we
get
the
predicted
image
out
of
the
model,
but
actually
conditioned
on
this
input
image.
A
We
could
also
just
sample
a
whole
bunch
of
times
and
get
kind
of
a
distribution
over
fine-grained
predictions.
A
The
other
cool
thing
which
is
lifting
from
this
recent
idea
in
machine
learning,
is
that
of
interpolation
within
the
latent
space.
So
I
said
that
this
technique
is
symmetric
in
that
you
could
either
upscale
or
down
scale.
So
in
this
case
we
start
from
some
nwp
data,
so
some
worf
images,
the
one
at
the
far
left
top
and
far
right
top
are
in
our
data
set
and
everything
else
is
generated
by
the
model.
A
So
how
does
this
work?
Well,
we
input
this
image
map
it
to
the
latent
space
and
then
we
can
output
from
that
late
that
point.
In
latent
space
we
can
output
a
core
screen
image.
Similarly,
with
this
endpoint,
then
we
can
also
take
a
walk
in
latent
space
and
now
there's
a
whole
literature
and
machine
learning
on
how
to
do
this,
so
how
to
interpolate
in
your
along
the
distribution
that
you've
learned
over
latent
space.
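The simplest such walk is linear interpolation between the two endpoints' latent codes, decoding each intermediate point. A sketch of just the walk itself; the latent codes below are invented, whereas in the actual pipeline they come from pushing the endpoint WRF images through the learned flow:

```python
import numpy as np

def interpolate(z_a, z_b, n_steps):
    """Straight-line walk in latent space between two encoded endpoints."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return np.array([(1 - t) * z_a + t * z_b for t in ts])

# Stand-in latent codes for the two endpoint images (invented numbers).
z_start = np.array([0.0, 1.0])
z_end = np.array([4.0, -1.0])

path = interpolate(z_start, z_end, 5)
# Decoding each row of `path` through the flow would yield the
# in-between maps shown between the two real images.
```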
A
And
so
you
know
these
two
wharf
images
were,
you
know,
simulating
precipitation
at
different
times,
so
we
could
actually
view
this.
As
some
you
know:
ai
conjectured,
temporal
interpolation
of
my
precipitation
map
at
fine
scale
and
at
coarse
scale.
A
So
that's
interesting
and
you
guys
can
tell
us
how
interesting
it
is,
but
it's
something
that
the
model
can
do
and
and
so,
as
I
mentioned,
you
know,
when
you
input
this
sample,
then
you're
getting
a
conditional
sample
at
the
core
scale.
You
could
also
sample
unconditionally,
so
you
could
just
sample
from
your
distribution
over
the
latent
space
and
then
generate
the
the
maps
at
higher
low
res
from
those
samples.
A
So
I
think
what
I'll
do
is
so
I
had
kind
of
two
possible
places
to
end,
depending
on
how
much
discussion
we
wanted
to
have
so
I'll,
say
some
kind
of
conclusory
and
outlook
type
statements
and
then
see
how
much
people
want
to
discuss
versus
hear
about
the
avalanche
project.
A
But
we
are
getting
interested
in
extremes.
We've
done
work
unsupervised
work
just
simply
on
how
do
you
represent
multivariate
extreme
events?
How
do
you
learn
them
from
data
in
an
unsupervised
way
in
a
soft
probabilistic
way,
where
you're
looking
at
multiple
variables
time
simultaneously,
so
different,
relative
and
specific
humidities
and
temperatures
and
pressures
and
your
method
can
detect
multiple
event
types
unlike
something
like
the
palmer
drought
severity
index?
That's
really
only
looking
for
one
event:
type:
drought.
A
We've
done
some
work
on
hurricane
track
forecasting,
where
we
would
look
at
track
data
track
history
and
then
also
spatiotemporal
fields
around
like
the
storm
center
from
the
track,
and
we
were
just
looking
at
temperature
and
pressure
for
some
of
these
studies
and
got
actually
comparable
performance
to
state
of
art
at
the
time
at
the
national
hurricane
center
avalanche
detection,
which,
if
I
have
time,
I
can
discuss
a
little
more,
both
with
supervised
learning
and
then
with
an
auto
encoder
and
then
a
semi-supervised
pipeline
and
then
other
work
around
extremes,
both
dry
and
wet
extremes
from
precipitation
in
monsoon,
and
I
was
also
recently
asked-
and
this
might
be
kind
of
a
good
conversation,
starter
or
outlook
slide.
A
What
I
see
as
sort
of
bottlenecks
or
challenges
in
climate
science.
Now
you
know
some
of
the
challenges.
A
A challenge for learning and dimensionality reduction is in large part that the amount of labeled data, data paired with some ground-truth target that you would like your model to be able to predict, is very limited, typically compared to the dimension of the data, and that trade-off is important in machine learning: if you have high-dimensional data, you'll need more labeled data to do supervised learning. That's why we're turning to unsupervised techniques. Class imbalance?
A
You
know,
that's
why
we're
looking
more
at
anomaly,
detection
type
techniques,
and
so
some
of.
B
A
We've
we've
addressed
or
were
addressing
using
those
first
two
bullets,
but
one
thought
I've
been
having
is
that,
just
in
general,
we're
in
the
measurement
era
now,
but
that
only
goes
back
so
far
in
time,
not
not
very
long
in
time,
and
we
don't
have
a
counter
factual
world
right.
A
So
we've
only
observed
a
time
series
once
per
location,
but
can
we
actually
substitute
the
diversity
and
granularity
that
we
might
need
in
our
data
to
train
machine
learning
from
the
temporal
regime
regime
where
we're
kind
of
poor
in
data
to
the
spatial
regime?
Where
we're
very
data
rich
these
days?
A
So
there's
issues
of
skill
resolution.
I
just
talked
about
downscaling
and
there's
no
reason
this
can't
apply
to
other
spatiotemporal
fields
that
you're
interested
in.
But
I
know
that
modelers
are
hoping
that
machine
learning
can
weigh
in
on
some
of
these
parameterization
questions.
Where
you
know,
moist
processes
can't
be
directly
modeled,
and
I
don't
have
too
much
to
say
about
that.
I
think
you
know
people
at
your
own
institution,
like
john
gagne
and
others
are
looking
into
that.
A
And around climate informatics: improving ensemble prediction, making it adaptive to both space and time, and using algorithms that actually learn the level of non-stationarity in both space and time while making these adaptive ensembles; and then, of course, interpretability.
A
So
I
think
interpretability
has
informed
all
our
work,
but
there's
really
a
long
ways
to
go
with
that,
and
there
are
people.
Now
I
mean
there's
this
new
ai
institute
on
trustworthy
machine
learning
for
weather
and
climate
m.a
ebert
upoff,
who
you
may
know,
is
involved
with
that,
and
it's
led
by
amy,
mcgovern
and
so
they're
directly
trying
to
say
how
do
we
make
an
ai
driven
model
or
forecast
that
can
be
directly
interpretable
by
humans
and
communities
etc?
B
This is all great so far. If folks want to weigh in in the chat, whether you want to hear about the avalanche work or you have questions, we can jump right into questions from the audience. You can type those in the chat or use the raise-hand feature, which is in the reactions pane at the bottom of the Zoom screen, and then ask your question out loud. Feel free to also turn your cameras on if you'd like, so we can make it a little bit more interactive.
B
While we wait for some questions, I guess, Claire, coming off of your last point on interpretability: do you have any interpretability examples that you can talk about from your work, especially maybe some of the downscaling work or other projects your group is working on?
A
Let's see, I did have something to say about that. I just want to be able to see my screen again.
A
Okay, so you can still see my shared screen, but I just wanted to check. I had some thoughts about that recently.
A
Yeah, because I'm secretly looking at something else. What can I say about interpretability?
A
Oh, okay. One way that generative learning has been used, and actually more so generative adversarial networks, is to generate possible instantiations of things. I've seen this in your community for precipitation maps and clouds, two examples where I've seen people give talks at Climate Informatics events, and I love these images and these clouds that they generate. But in at least several of these talks I've seen, so I guess we can go back to the interpretability slide:
A
How do I evaluate whether the distribution over the cloud images that I've generated is good? Does it match the distribution over cloud images in nature? I've sort of chuckled, because this is actually really an area of active research in AI itself, in interpretable AI: for GANs in general, whether you're applying them to clouds or to celebrity faces or what have you, you've now got this generative model that can give you distributions over images.
A
And it's almost philosophical, and certainly non-trivial, because if you had a formal description of the true distribution, like a generative model for the true distribution, you'd be done; you wouldn't even be learning. So in my group, on the theoretical side, we're looking at formal ways to evaluate generative models such as GANs. No interesting results to report yet, other than "GANs can't count," which is one preliminary experimental finding. But yeah, it's interesting.
C
Hi, thanks for a really interesting talk. I was wondering if you could discuss a bit more the idea of borrowing in space to substitute for short records in time. I have this notion of trying to do a skin graft, maybe, of one part of the world onto another, but it seems like you'd have to pick which places you could use to substitute. I don't know if you could just talk about that a little bit.
A
Yeah, right. So there are these canonical examples in AI where you can have training data that learns a perfect classification between an actual wolf and a husky dog.
A
But
then,
if
you
look
at
the
interpretation
of
what
part
of
the
images
you
were
looking
at,
they
were
just
looking
at
the
backgrounds
and
the
wolves
were
only
pictured
in
snow
in
the
training
data
set
and
the
huskies
were
pictured
on
grass,
for
example,
so
data
diversity
over
your
whole
feature
space
is
just
critical
for
machine
learning,
and
so
my
point
is
that
for
for
for
most
things,
you
know
our
only
our
data
only
goes
back
a
few
years
in
time.
A
Now,
if
you're
doing
something
like
real
time,
power,
output,
forecasting
of
solar
or
wind-
where
you
want
to
predict
on
minutes
down
to
seconds,
then
actually
you
have
plenty
of
of
data,
but
for
a
lot
of
you
know,
climate-based
things
where
you
care
about
monthly
or
decadal
or
annual.
We
really
have
a
very
limited
time
scale.
So
I
guess
what
I
was
more
thinking
is:
we
need
to
add
diversity
and
granularity
in
our
data
to
get
robust
models.
A
There may have been different laws or different activities in different regions, and so now you can get time-series data where you can kind of simulate the counterfactual via diversity over space. Of course, you'd have to control for all the other confounds in how these geographic locations differ.
B
Okay, great. Jerry, go ahead. Yeah, hi Claire, that was really interesting. I'm going to ask you a very naive question. I see the supervised learning part, because you've got a coarse grid that you're putting in and you have information about the time and space scales you're getting out; but in the unsupervised part you're giving it this, say, one-degree data that presumably doesn't know anything about
B
What's
going
on
on
the
scale
of
convective
organization,
for
example,
so
how
what's
it
picking
up
on
in
the
one
degree,
data
that
would
actually
produce
convective
organization
at
you
know
like
at
the
10
kilometer
level,
I
mean
it
it.
It
seems
like
you're
getting.
It
seems
like
it's
kind
of
magic,
and
it
must
be
picking
up
on
something
that
it's
that
the
input
in
the
input
data
that
tells
it
that
there's
something
at
the
sub
grid
scale
that
it's
going
to
give
you
in
the
output.
A
The
the
only
quote,
unquote
magic
is
that
the
images
so
the
predictions
of,
or
whatever
the
the
data
output
by
era
versus
the
data
output
by
worf
at
at
the
1
degree
resolution
or
the
1
8
degree
resolution
they're
registered
to
the
same
geographical
bounding
box.
A
It's literally just that there should be some shared geographical structure, because they are registered to be aligned on the same bounding box, and then you're seeing many instances at coarse resolution and many instances at high resolution. But I think the frustrating part, in terms of interpretability of many of these deep methods, is that we can't explicitly point to, or even believe, that anything was really learned about the physics.
A
There
is
there
are
people
bringing
physics
in
to
constrain
deep
learning
in
various
ways.
So
when
I
wrote
down
that
loss
function,
you
could
have
in
your
loss
function
some
encoding
of
a
physical
law,
for
example.
A
We
would
call
that
a
regularizer,
so
I
saw
anish
subermanian
is
on
the
call
he's
been
involved
with
that
kind
of
work.
I
believe
dj
gagne
has
been
involved
with
that
kind
of
work
as
well.
B
Thanks
claire
yeah,
we
still
have
time
for
questions
so
feel
free
to
folks
feel
free
to
jump
in.
I
guess
maybe
I'll
ask
kind
of
maybe
coming
off
of
jerry's
question.
B
The
I
think
will
be
interesting
for
folks
to
hear
about
more
is
the
physical
interpretation
of
the
latent
space,
and
I
know
this
is
something
that
you
and
I
have
talked
about.
I
think,
but
what
I
hadn't
seen
was
the
the
part
about
taking
a
walk
through
the
latent
space
and
extrapolating
through
time.
So
I
found
that
to
be
an
interesting
aspect
of
potentially
applying
how
you
might
use
the
information
from
this
latent
space
to
inform
you
know
other
extrapolations.
A
Sure. This is another instance where the interpretability that would be useful in climate and meteorology would also be useful in AI in general, and so interpretability and how to traverse the latent space are really bleeding-edge topics right now in core AI. I've seen AI papers coming out saying that, you know, the meaningful...
A
So now you've learned your posterior. You started with some silly prior, either a uniform or an isotropic Gaussian prior; then you've done your normalizing flow, so you've gotten a much more informative distribution over your latent space.
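A toy illustration of that step, with a single invertible affine layer standing in for the flow (the class and its parameters are hypothetical, just to show how the change-of-variables formula turns a standard-normal base density into a more informative one):

```python
import numpy as np

class AffineFlow:
    """One invertible affine transform x = a*z + b, the simplest
    normalizing-flow layer over a standard-normal base density."""
    def __init__(self, a, b):
        assert a != 0.0          # must be invertible
        self.a, self.b = a, b

    def forward(self, z):        # base sample -> data space
        return self.a * z + self.b

    def inverse(self, x):        # data space -> base sample
        return (x - self.b) / self.a

    def log_prob(self, x):
        # change of variables: log p(x) = log p_base(f^{-1}(x)) - log|a|
        z = self.inverse(x)
        log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
        return log_base - np.log(abs(self.a))
```

Real flows stack many such layers with learned, nonlinear parameterizations, but the log-likelihood bookkeeping is the same sum of log-determinant terms.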
A
How can you interpret what's there, or how could you traverse it so that an interpolation you make between two known data points has the correct interpretation? Papers are coming out saying that you should follow a geodesic path in your probability space. It's kind of like, you're all very familiar with geodesic paths on the earth or on a topographical map; now the topographical map is the level sets of probability in the posterior distribution that we've learned. And so what happens?
A
So there are these approximation techniques where you'll make maybe a linear approximation locally and then, you know, snap back to a geodesic; those sorts of iterative techniques trade the faithfulness of the path to the geodesic of the learned posterior against computation time. But yeah, I have had scientists challenge me and say, you know, you can't really say that that's a temporal interpolation, and right, that again gets into interpretability.
A
All I meant was that we know these simulations were representing precip at fine-grained resolution at two different time points from our NWP, right? And so we can find a path connecting them in latent space, subject to the quality of the path; ideally, it's geodesic on the learned distribution.
A
I'm not sure in terms of actually visualizing the latent space. I would also point people to work by Imme Ebert-Uphoff and Libby Barnes at CSU, where, I think not at this AGU but the previous one...
A
They have done work around visualizing latent spaces, I believe, and they're really trying to translate these models and make them more interpretable in the domain.
C
Thanks. Yeah, I guess another question. I was really excited to see your description of the domain alignment problem, which I just hadn't heard described that way. I guess it could maybe be related to a prediction problem where you have, say, a biased climate model in terms of making climate predictions, and you're trying to look for a mapping to get from that model to something else. I guess my question is about the well-posedness of the problem of finding f.
A
So if you had two random variables in nature, x and y, like, we were hoping those would be the precipitation field at high and low resolution, for example, then it's safe to say, or maybe slightly more plausible to say, that you can get access to i.i.d. samples from their marginals.
A
Just, you know, one parameter: what range should that parameter be in? You've already made a decision there, and then what distribution over that range are you sampling from when you run your ensemble of runs? So I think it would be promising.
A
I think there could be some cool research there, and I'd be really happy to chat more. What I'm seeing as the hard problem, whether you want to call it a statistics problem or a philosophical problem, is: where is the random variable? There were a lot of decisions and sort of engineering choices behind a climate model, so I think it is not really a random variable. You might be able to argue otherwise, or, given that it might not be...
A
How could we sample from it, or run it randomly in different ways? Because while you don't need paired data, you are relying on the i.i.d. assumption of access to the marginal of each of the two variables.
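That i.i.d.-marginals assumption can be made concrete with a small sketch (the function name and the use of NumPy index sampling are illustrative, not the actual training code): each minibatch is drawn independently from each domain, with no pairing anywhere.

```python
import numpy as np

def unpaired_batches(domain_x, domain_y, batch_size, rng):
    """Draw one minibatch from each domain's marginal, independently.
    No pairing between the X and Y samples is assumed or used, which
    is all that unpaired domain alignment (e.g. CycleGAN-style
    training) requires -- only i.i.d. access to each marginal."""
    ix = rng.integers(0, len(domain_x), size=batch_size)
    iy = rng.integers(0, len(domain_y), size=batch_size)
    return domain_x[ix], domain_y[iy]
```

The question raised here is whether an ensemble of climate model runs can honestly be treated as `domain_x` in this sense, i.e., as i.i.d. draws from some underlying marginal, given how many engineering decisions sit behind each run.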